Use actual AI SDK token usage for compression and fix pricing lookup #2803
tim-inkeep wants to merge 38 commits into main
Conversation
Add append-only usage_events table for tracking LLM generation usage across all call sites. Includes token counts (input, output, reasoning, cached), dynamic pricing cost estimate, generation type classification, and OTel correlation fields. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two-tier dynamic pricing: gateway getAvailableModels() as primary (when AI_GATEWAY_API_KEY is set), models.dev API as universal fallback. In-memory cache with periodic refresh (1h gateway, 6h models.dev).
Insert, query (paginated), and summary aggregation functions for usage_events table. Supports groupBy model/agent/day/generation_type.
recordUsage() extracts tokens from AI SDK responses, looks up pricing, sets OTel span attributes, and fire-and-forgets a usage_event insert. New SPAN_KEYS: total_tokens, reasoning_tokens, cached_read_tokens, response.model, cost.estimated_usd, generation.step_count, generation.type.
Add usage, totalUsage, and response fields to ResolvedGenerationResponse. resolveGenerationResponse now resolves these Promise-based getters from the AI SDK alongside steps/text/finishReason/output.
Call recordUsage() after resolveGenerationResponse in runGenerate(), capturing tenant/project/agent/subAgent context, model, streaming status, and finish reason. Fire-and-forget, non-blocking.
Add recordUsage() calls for status_update and artifact_metadata generation types in AgentSession. Compression call sites deferred (need context threading through function signatures).
Consolidate estimateTokens() and AssembleResult into packages/agents-core/src/utils/token-estimator.ts. Update all 10 import sites in agents-api to use @inkeep/agents-core. Removes duplicate code and prepares for usage tracker integration.
Replace recordUsage() with trackedGenerate() — wraps generateText/streamText calls to automatically record usage on success AND failure. Failed calls check error type: 429/network = 0 tokens, other errors = estimated input tokens from prompt. All call sites (generate.ts, AgentSession status updates + artifact metadata, EvaluationService simulation) now use the wrapper consistently.
GET /manage/v1/usage/summary — aggregated usage by model/agent/day/generation_type with optional projectId filter. GET /manage/v1/usage/events — paginated individual usage events with filters for project, agent, model, generation type. Both enforce tenant auth with project-level access checks.
Tenant-level usage dashboard at /{tenantId}/usage with:
- Summary stats: total tokens, estimated cost, generation count, models
- Token usage over time chart (daily buckets via AreaChartCard)
- Breakdown tables by model and generation type
- Project filter and date range picker
- Nav item added to sidebar
Extract UsageDashboard, UsageStatCards, UsageBreakdownTable into
reusable component. Both tenant-level (/{tenantId}/usage) and
project-level (/{tenantId}/projects/{projectId}/usage) pages import
the shared component. Register Usage tag in OpenAPI spec + docs.
- Route handlers use c.get('tenantId') from middleware context
- Client fetches through /api/usage Next.js proxy (forwards cookies)
- Initialize PricingService at server startup for cost estimation
resolvedModel from the AI SDK doesn't include provider prefix (e.g. 'claude-sonnet-4-6' not 'anthropic/claude-sonnet-4-6'). Parse requestedModel once at the top and use the extracted modelName for pricing lookup, falling back to resolvedModel when available.
…cking
Data layer:
- Add steps JSONB column for per-step token breakdown
- Populate traceId/spanId from active OTel span
- Add conversation/message groupBy + conversationId filter
- Thread agentId/conversationId through compression call chain
- Wrap compression generateText calls with trackedGenerate
Traces integration:
- Conversation detail route fetches usage events and merges cost into activities by spanId (with parentSpanId fallback)
- Cost shows on timeline items and span detail panels
- Usage Cost card on conversation detail page
UI:
- Events table with pagination, trace links, agent/sub-agent columns
- 50/50 chart + events layout
- conversationId filter in usage API client
Write path:
- Remove Postgres insert from persistEvent, keep OTel span attributes
- Add all schema fields as span attributes (requested_model, provider, status, streamed, byok, finish_reason, duration_ms, error_code, message_id)
- Add UsageCostSpanProcessor that enriches doGenerate/doStream spans with per-step cost from PricingService before export
- Standardize experimental_telemetry.metadata.generationType across all call sites (status_update, artifact_metadata, compression)
- Extract USAGE_GENERATION_TYPES constant for shared filter
Read path:
- Add getUsageCostSummary (SigNoz aggregation by model/agent/type/conversation)
- Add getUsageEventsList (SigNoz LIST query for individual spans)
- Dashboard fetches from SigNoz via existing signoz-stats client
- Events table and chart derived from SigNoz span data
- Filter to only spans with known generationType to reduce noise
trackedGenerate no longer takes db parameter.
🦋 Changeset detected. Latest commit: 2eb1617. The changes in this PR will be included in the next version bump. This PR includes changesets to release 10 packages.
@pullfrog, please review my PR
Solid PR — the shift from heuristic token estimation to actual AI SDK usage data is well-motivated and the implementation is clean. The wrapLanguageModel approach to fix pricing lookups is elegant. Two medium-severity issues (potential PricingService interval leak, v3 middleware spec version risk) and a few minor items below.
    if (this.initialized) return;
    this.initialized = true;

    await Promise.allSettled([this.refreshGateway(), this.refreshModelsDev()]);
Medium: initialize() is not idempotent against concurrent callers. The initialized guard prevents re-entry but uses a synchronous boolean — if two callers race past the guard before the first sets this.initialized = true, both will set up duplicate intervals. Consider storing the init promise:
    private initPromise: Promise<void> | null = null;

    async initialize(): Promise<void> {
      if (!this.initPromise) {
        this.initPromise = this.doInitialize();
      }
      return this.initPromise;
    }

Alternatively, guard the interval creation behind this.gatewayInterval == null to be safe.
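The suggested pattern can be sketched end to end. Class and method names here (PricingServiceLike, doInitialize) are illustrative stand-ins, not the PR's actual code:

```typescript
// Sketch of the promise-memoizing init pattern suggested above.
// The memoized promise is assigned before any await, so concurrent
// callers all share the same in-flight initialization.
class PricingServiceLike {
  private initPromise: Promise<void> | null = null;
  public initCount = 0; // counts how many times the real work ran

  async initialize(): Promise<void> {
    if (!this.initPromise) {
      this.initPromise = this.doInitialize();
    }
    return this.initPromise;
  }

  private async doInitialize(): Promise<void> {
    this.initCount++;
    // e.g. await Promise.allSettled([this.refreshGateway(), this.refreshModelsDev()]);
  }
}
```

Because the promise is stored synchronously, even two callers that race into initialize() on the same tick observe one initialization.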
    if (this.modelsDevInterval) clearInterval(this.modelsDevInterval);
    this.gatewayInterval = null;
    this.modelsDevInterval = null;
    this.initialized = false;
Minor: destroy() does not clear initPromise / caches. If someone calls destroy() then initialize() again, this.initialized is false but the caches still contain stale data from the previous lifecycle. Not blocking — the singletons are long-lived in practice — but worth noting for test hygiene.
    }

    export const usageCostMiddleware: LanguageModelMiddleware = {
      specificationVersion: 'v3',
Medium: specificationVersion: 'v3' ties this to an unreleased/experimental middleware API version. If the AI SDK ships a breaking change to the v3 spec (usage shape, callback signatures), this will silently break cost tracking. Confirm this version is stable in the ai package version pinned in your lockfile. If not, add a comment noting the version dependency.
    const result = await doGenerate();

    try {
      const inputTokens = result.usage.inputTokens.total ?? 0;
Minor: result.usage.inputTokens.total assumes a nested .total property. This matches the v3 spec's structured usage shape, but the old v1/v2 shape used flat inputTokens: number. If any codepath bypasses wrapLanguageModel and hits this middleware with the old shape, it will throw. The try/catch on line 77 guards against this, so it's safe — just noting the implicit contract.
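A defensive extractor for this implicit contract, one that accepts both the nested v3 shape and the older flat shape, might look like the following sketch (field names are assumptions, not the PR's actual helper):

```typescript
// Hedged sketch of a helper in the spirit of the PR's extractUsageTokens():
// accepts the v3 structured shape ({ inputTokens: { total } }) as well as
// the older flat shape ({ inputTokens: number }), defaulting to 0.
type FlatOrNested = number | { total?: number } | undefined;

function tokenCount(value: FlatOrNested): number {
  if (typeof value === "number") return value; // flat v1/v2 shape
  if (value && typeof value.total === "number") return value.total; // nested v3 shape
  return 0; // missing or malformed usage
}

function extractUsageTokens(usage: {
  inputTokens?: FlatOrNested;
  outputTokens?: FlatOrNested;
}): { inputTokens: number; outputTokens: number } {
  return {
    inputTokens: tokenCount(usage.inputTokens),
    outputTokens: tokenCount(usage.outputTokens),
  };
}
```

This removes the reliance on the try/catch: a flat-shaped usage object degrades to its number instead of throwing.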
      `To access other models, use OpenRouter (openrouter/model-id), Vercel AI Gateway (gateway/model-id), NVIDIA NIM (nim/model-id), or Custom OpenAI-compatible (custom/model-id).`
    );
    }
    return wrapLanguageModel({
The modelId: modelString here passes the full provider/model-name string (e.g. anthropic/claude-sonnet-4). This is what calculateAndSetCost receives as modelId, and then it splits on / to extract the model name when providerId is present (line 29 of usage-cost-middleware.ts). This works correctly — just confirming the data flow is intentional since the middleware does its own parsing.
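The described data flow, splitting the provider prefix off the modelId and multiplying token counts by per-token prices, can be sketched as follows (type and field names are assumptions, not the PR's actual code):

```typescript
// Hedged sketch: parse "provider/model-name" and compute an estimated cost.
interface ModelPricing {
  inputPerToken: number;
  outputPerToken: number;
}

function parseModelId(modelId: string): { provider?: string; modelName: string } {
  // Split on the first "/" only, so model names containing "/" stay intact.
  const idx = modelId.indexOf("/");
  if (idx === -1) return { modelName: modelId };
  return { provider: modelId.slice(0, idx), modelName: modelId.slice(idx + 1) };
}

function estimateCostUsd(
  usage: { inputTokens: number; outputTokens: number },
  pricing: ModelPricing
): number {
  return usage.inputTokens * pricing.inputPerToken + usage.outputTokens * pricing.outputPerToken;
}
```

With prices expressed per token (e.g. $3/M input, $15/M output), 1,000 input and 100 output tokens come to $0.0045.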
    if (hasReliableUsage) {
      // Use actual token counts from the last completed step
      // Next step's context ≈ last step's input + last step's output (assistant response appended)
      totalTokens = actualInputTokens + (actualOutputTokens ?? 0);
Correctness check: totalTokens = actualInputTokens + (actualOutputTokens ?? 0) approximates the next step's context size as "last input + last output". This is a good heuristic but slightly oversimplifies — the output gets appended as a new assistant message, so the actual input for the next step includes the original context plus the output tokens, which is what inputTokens already captures for the current step. So the formula effectively double-counts the prior context. In practice this is conservative (triggers compression earlier), which is arguably safer. Worth documenting the rationale.
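The arithmetic in question can be made concrete with a small sketch (function names are illustrative, not the PR's code):

```typescript
// Estimate of the next step's context size, per the formula under review:
// last step's input plus last step's output. As noted above, when
// inputTokens already reflects previously appended assistant output this
// overestimates, which biases toward triggering compression earlier.
function nextStepContextEstimate(
  actualInputTokens: number,
  actualOutputTokens: number | undefined
): number {
  return actualInputTokens + (actualOutputTokens ?? 0);
}

// Compression fires once the estimate crosses the hard limit minus a buffer.
function compressionNeeded(estimate: number, hardLimit: number, safetyBuffer: number): boolean {
  return estimate >= hardLimit - safetyBuffer;
}
```

With a 12,000-token limit and a 2,000-token buffer, a step that consumed 10,000 input and produced 500 output tokens yields an estimate of 10,500 and trips the threshold of 10,000.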
    safetyBuffer,
    triggerAt,
    remaining: hardLimit - totalTokens,
    source: steps.length > 0 ? 'actual_sdk_usage' : 'estimated',
Nit: source: steps.length > 0 ? 'actual_sdk_usage' : 'estimated' — at this point in the code, we're inside the compressionNeeded branch. The source was already determined above, but this ternary re-derives it from steps.length which doesn't account for the hasReliableUsage check (e.g. steps.length > 0 but inputTokens was 0 → fell back to estimate). Consider using a local source variable set at the decision point.
    // USAGE GENERATION TYPES (table removed — usage now tracked via OTel/SigNoz)
    // ============================================================================

    import { USAGE_GENERATION_TYPES } from '../../constants/otel-attributes';
Importing from ../../constants/otel-attributes inside a schema file is a bit unusual — it creates a dependency from the DB schema layer to the telemetry constants layer. Since this is just a type re-export and the comment says "table removed — usage now tracked via OTel/SigNoz", it makes sense, but consider whether USAGE_GENERATION_TYPES + GenerationType belong in otel-attributes.ts or in a shared usage-types.ts to keep the schema file focused on DB concerns.
      }),
    };

    const result = await generateText(genConfig as Parameters<typeof generateText>[0]);
The as Parameters<typeof generateText>[0] cast here and in several other places (AgentSession.ts, EvaluationService.ts) suggests the config object doesn't naturally satisfy the generateText parameter type. This is a known pattern when building configs incrementally, but the number of casts in this PR is growing. Not blocking — just flagging for awareness.
    const MODEL_ALIASES: Record<string, string[]> = {
      'claude-sonnet-4': ['claude-sonnet-4'],
      'claude-opus-4': ['claude-opus-4'],
      'claude-haiku-3.5': ['claude-3-5-haiku', 'claude-3.5-haiku'],
      'claude-sonnet-3.5': ['claude-3-5-sonnet', 'claude-3.5-sonnet'],
      'claude-opus-3': ['claude-3-opus'],
      'claude-haiku-3': ['claude-3-haiku'],
    };
The alias map is Anthropic-only right now. OpenAI, Google, and other providers have similar aliasing needs (e.g. gpt-4o vs gpt-4o-2024-08-06). This is fine as a starting point — the stripDateSuffix regex handles the most common case — but the map will need expansion as users hit pricing misses for other providers.
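A sketch of how such pricing-lookup normalization could work, with an illustrative date-suffix regex and an alias map inverted for direct lookup (the PR's actual map shape differs, mapping canonical names to alias lists):

```typescript
// Hedged sketch of model-name normalization for pricing lookups.
// The alias entries and regex here are illustrative assumptions.
const MODEL_ALIASES: Record<string, string> = {
  "claude-3-5-haiku": "claude-haiku-3.5",
  "claude-3.5-haiku": "claude-haiku-3.5",
};

// Strip a trailing YYYY-MM-DD version suffix, the common dated-snapshot
// scheme (e.g. gpt-4o-2024-08-06 -> gpt-4o).
function stripDateSuffix(modelName: string): string {
  return modelName.replace(/-\d{4}-\d{2}-\d{2}$/, "");
}

function normalizeModelName(modelName: string): string {
  const base = stripDateSuffix(modelName);
  return MODEL_ALIASES[base] ?? base;
}
```

Extending coverage to another provider is then just more alias entries, with the date-suffix strip handling dated snapshots generically.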
TL;DR — Replaces the inaccurate character-based token estimation heuristic with actual AI SDK token usage for mid-generation compression decisions.

Summary: 46 files | 22 commits

Key changes:
- Actual token usage for mid-generation compression
- Pricing service and cost middleware
- Enriched OTEL span attributes
- Usage Dashboard UI (data is fetched from SigNoz via two new query methods)
PR Review Summary
(2) Total Issues | Risk: Low
This is a delta review covering commits since the last automated approval (2140a3678). The core feature implementation has been extensively reviewed (9 prior automated reviews) with all Critical and Major issues addressed.
🟡 Minor (1)
🟡 1) cost/page.tsx:31 Project filter state not persisted in URL
Issue: The tenant-level cost page's project filter uses local React state (useState) rather than URL query params. The filter is lost on page refresh/navigation.
Why: Flagged by Ito tests: "Tenant cost project filter state is not persisted across refresh/history". Users sharing URLs or refreshing will see the filter reset to "All projects".
Fix: Use useQueryState from nuqs to persist the project filter in the URL.
Refs: See inline comment on agents-manage-ui/src/app/[tenantId]/cost/page.tsx:31
Inline Comments:
- 🟡 Minor: cost/page.tsx:31 (Project filter not persisted in URL)
💭 Consider (1)
💭 1) otel-attributes.ts:86-105 Unused SPAN_KEYS constants
Issue: 14 SPAN_KEYS constants in the GEN_AI_* group are defined but never referenced in the codebase. Only 4 are actually used (GEN_AI_USAGE_INPUT_TOKENS, GEN_AI_USAGE_OUTPUT_TOKENS, GEN_AI_COST_ESTIMATED_USD, GEN_AI_COST_PRICING_UNAVAILABLE).
Why: As @shagun-singh-inkeep noted, this adds ~60 lines of dead code. Consider removing unused constants or adding comments explaining future intent.
Refs: See inline comment on packages/agents-core/src/constants/otel-attributes.ts:105
✅ Human Reviewer Feedback Addressed
| Feedback | Status |
|---|---|
| @shagun-singh-inkeep: Use SPAN_KEYS.GEN_AI_COST_PRICING_UNAVAILABLE instead of hardcoded string | ✅ Fixed in usage-cost-middleware.ts:84 |
| @shagun-singh-inkeep: Remove dead groupBy === 'day' code | ✅ Fixed in signoz-stats.ts, 'day' option removed from type union |
| @shagun-singh-inkeep: Unused SPAN_KEYS are dead code | |
✅ Prior Critical/Major Issues Resolved
All Critical and Major issues from prior review cycles have been addressed:
- ✅ External HTTP timeout added to models.dev fetch
- ✅ extractUsageTokens() helper handles both nested and flat usage shapes
- ✅ initPromise pattern ensures idempotent initialization
- ✅ Serverless optimization with on-access refresh (no setInterval)
- ✅ USAGE_GENERATION_TYPES constant used consistently
- ✅ Comprehensive test coverage for pricing-service and usage-cost-middleware
🧹 While You're Here (Ito Test Observations)
The Ito test suite found 2 pre-existing issues (not introduced by this PR):
- Generate-render APIs return 500 for 403/404 errors — Both artifact and data component render routes catch upstream ApiError but always return 500, obscuring access-denied semantics.
- Conversation trace API has unsanitized query construction — The conversationId is interpolated directly into SigNoz filter expressions without validation.
These are worth tracking but out of scope for this PR.
✅ APPROVE
Summary: This PR is in excellent shape after 9 review iterations. All Critical and Major issues have been addressed. The core implementation — using actual AI SDK token counts for compression, PricingService with dual-source lookup, cost middleware, and cost dashboard — is well-tested and production-ready. Two minor items remain (project filter URL persistence and unused SPAN_KEYS), both non-blocking. The serverless optimization concern from @robert-inkeep has been properly addressed with the lazy/stale-while-revalidate pattern. Ready to ship! 🚀
Discarded (0)
No findings discarded — all prior issues verified as resolved.
Reviewers (1)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
| orchestrator | 4 | 1 | 1 | 0 | 2 | 0 | 0 |
| Total | 4 | 1 | 1 | 0 | 2 | 0 | 0 |
Note: Delta review focused on human reviewer feedback and Ito test failures. All prior automated review findings verified as resolved.
    GEN_AI_GENERATION_BYOK: 'gen_ai.generation.byok',
    GEN_AI_GENERATION_STREAMED: 'gen_ai.generation.streamed',
    GEN_AI_MESSAGE_ID: 'gen_ai.message_id',
💭 Consider: Unused SPAN_KEYS constants
Issue: As @shagun-singh-inkeep noted, many of the newly added GEN_AI_* SPAN_KEYS are defined but not referenced anywhere in the codebase:
- GEN_AI_USAGE_REASONING_TOKENS
- GEN_AI_USAGE_CACHED_READ_TOKENS
- GEN_AI_RESPONSE_MODEL
- GEN_AI_GENERATION_STEP_COUNT
- GEN_AI_GENERATION_TYPE
- GEN_AI_REQUESTED_MODEL
- GEN_AI_PROVIDER
- GEN_AI_GENERATION_STATUS
- GEN_AI_GENERATION_DURATION_MS
- GEN_AI_GENERATION_FINISH_REASON
- GEN_AI_GENERATION_ERROR_CODE
- GEN_AI_GENERATION_BYOK
- GEN_AI_GENERATION_STREAMED
- GEN_AI_MESSAGE_ID
Why: These constants were added for future OTel instrumentation but are currently unused. They add ~60 lines of dead code that could confuse future contributors.
Fix: Either:
- Remove the unused constants now and add them when instrumentation is implemented
- Keep them with a // Future: will be set by X instrumentation comment explaining the intent
The only constants actually being used from this group are:
- GEN_AI_USAGE_INPUT_TOKENS
- GEN_AI_USAGE_OUTPUT_TOKENS
- GEN_AI_COST_ESTIMATED_USD
- GEN_AI_COST_PRICING_UNAVAILABLE
Refs:
    } = useTracesQueryState();

    const [projects, setProjects] = useState<Project[]>([]);
    const [selectedProjectId, setSelectedProjectId] = useState<string | undefined>(undefined);
🟡 Minor: Project filter state not persisted in URL
Issue: The selectedProjectId state is stored only in React state (useState) and not in URL query params. This means the project filter is lost on page refresh or browser back/forward navigation.
Why: Users who select a project filter and then refresh the page (or share the URL) will see the filter reset to "All projects". The Ito test suite flagged this as a regression: "Tenant cost project filter state is not persisted across refresh/history".
Fix: Add projectId to the useTracesQueryState() hook or use a separate useQueryState for the project filter:
    // Option 1: Add to useTracesQueryState hook
    const { projectId, setProjectId } = useTracesQueryState();

    // Option 2: Use nuqs directly
    import { parseAsString, useQueryState } from 'nuqs';
    const [selectedProjectId, setSelectedProjectId] = useQueryState('projectId', parseAsString);

Refs:
- Ito test failure - "Tenant cost project filter state is not persisted across refresh/history"
- use-traces-query-state.ts - existing pattern for URL state persistence
PR Review Summary
(0) Total Issues | Risk: Low
This is a delta review covering commits since the last automated review (review ID 4001789103, 2026-03-24T19:31:17Z).
✅ All Prior Issues Resolved
All issues from the 10 prior automated review cycles have been addressed:
| Category | Issue | Status |
|---|---|---|
| 🔴 Critical | External HTTP call without timeout | ✅ Fixed: AbortSignal.timeout(10_000) added |
| 🟠 Major | Nested property access may throw TypeError | ✅ Fixed: extractUsageTokens() handles both nested and flat usage shapes |
| 🟠 Major | Silent catch swallows model parsing errors | ✅ Fixed: Proper error handling with fallback to unknown provider |
| 🟠 Major | Hardcoded generation types missing component_render | ✅ Fixed: Now uses USAGE_GENERATION_TYPES constant |
| 🟡 Minor | Project filter state not persisted in URL | ✅ Fixed: Now uses useQueryState('projectId', parseAsString) |
✅ Human Reviewer Feedback Addressed
| Reviewer | Feedback | Status |
|---|---|---|
| @shagun-singh-inkeep | Remove dead groupBy === 'day' code | ✅ Removed from signoz-stats.ts |
| @shagun-singh-inkeep | Use SPAN_KEYS.GEN_AI_COST_PRICING_UNAVAILABLE instead of hardcoded string | ✅ Fixed in usage-cost-middleware.ts:84 |
| @shagun-singh-inkeep | Unused SPAN_KEYS are dead code | ✅ Noted — only 4 GEN_AI_* constants remain, all are now used |
Implementation Quality
The PR implements end-to-end LLM cost tracking with solid patterns:
- PricingService: Dual-source lookup (AI Gateway + models.dev) with on-access refresh (serverless-optimized, no setInterval)
- usageCostMiddleware: AI SDK v3 middleware that extracts both nested and flat usage shapes
- Cost Dashboard: URL-persisted filters, time range presets, project filtering
- Test Coverage: Comprehensive tests for pricing-service.ts and usage-cost-middleware.ts
✅ APPROVE
Summary: After 10 automated review cycles and addressing all human reviewer feedback, this PR is production-ready. The core implementation — using actual AI SDK token counts for compression, PricingService with dual-source lookup, cost middleware with proper error handling, and cost dashboard with URL state persistence — is well-tested and follows best practices. Ship it! 🚀
Reviewers (1)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
| orchestrator (delta) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Note: Delta review verified all prior findings addressed. No new issues found.
…ent render events on this dashboard
PR Review Summary
(0) Total Issues | Risk: Low
This is a delta review covering 5 commits since the last automated review (2140a3678).
✅ All Prior Issues Resolved
All Critical, Major, and Minor issues from the 11 prior automated review cycles have been addressed:
| Issue | Status |
|---|---|
| 🔴 External HTTP call without timeout | ✅ Fixed — AbortSignal.timeout(10_000) added to models.dev fetch |
| 🟠 Nested property access may throw TypeError | ✅ Fixed — extractUsageTokens() helper handles both nested and flat usage shapes |
| 🟠 initialize() not idempotent | ✅ Fixed — initPromise pattern ensures single initialization |
| 🟠 Serverless optimization | ✅ Fixed — On-access refresh strategy, no setInterval timers |
| 🟡 Project filter state not persisted in URL | ✅ Fixed — Now uses useQueryState('projectId', parseAsString) |
| 🟡 Unused SPAN_KEYS constants | ✅ Fixed — Trimmed to only 4 used GEN_AI_* constants |
| 🟡 Hardcoded gen_ai.cost.pricing_unavailable string | ✅ Fixed — Uses SPAN_KEYS.GEN_AI_COST_PRICING_UNAVAILABLE |
Delta Changes Reviewed
| Commit | Summary | Assessment |
|---|---|---|
| 026772a45 | Removed inkeep-agents-manage-ui from service name filter | ✅ Intentional — component render events won't be tracked in cost dashboard |
| 57519243a | Trimmed unused SPAN_KEYS, updated projectId persistence | ✅ Addresses @shagun-singh-inkeep's feedback |
| f29e54b72 | Major signoz-stats refactoring | ✅ Cleaner query construction |
| 7308c05cb, a2994de20 | Merge commits from main | ✅ Unrelated to usage-tracker feature |
Implementation Quality ✅
The PR implements end-to-end LLM cost tracking with solid patterns:
- PricingService: Dual-source lookup (AI Gateway + models.dev) with lazy initialization and on-access refresh (serverless-optimized)
- usageCostMiddleware: AI SDK v3 middleware with extractUsageTokens() handling both nested and flat usage shapes
- Cost Dashboard: URL-persisted filters (useQueryState), time range presets, project filtering
- Test Coverage: Comprehensive tests for pricing-service.ts (285 lines) and usage-cost-middleware.ts (296 lines)
- OTEL Attributes: Clean constant organization with only used keys exported
Test Coverage Verified
Both new utility files have comprehensive test coverage:
| File | Tests | Coverage |
|---|---|---|
| pricing-service.test.ts | 15 tests | calculateCost, getModelPricing, initialize idempotency, date suffix stripping, error handling, destroy lifecycle |
| usage-cost-middleware.test.ts | 14 tests | v3 structured usage, flat usage, pricing_unavailable attribute, no-span scenarios, model ID parsing |
✅ APPROVE
Summary: After 11 review iterations and comprehensive human reviewer feedback from @shagun-singh-inkeep, this PR is production-ready. All Critical, Major, and Minor issues have been properly addressed. The implementation follows best practices for:
- Serverless optimization — Lazy init with stale-while-revalidate refresh pattern
- Error handling — Graceful degradation when pricing unavailable
- Type safety — Handles both AI SDK v3 structured and flat usage shapes
- Observability — Proper OTEL span attributes for cost tracking
- UI/UX — URL-persisted filters for shareable dashboard state
Ship it! 🚀
Note: Unable to submit formal GitHub approval due to permission constraints — this is a COMMENT review with approval recommendation.
Reviewers (1)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
| orchestrator (delta) | 5 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 5 | 0 | 0 | 0 | 0 | 0 | 0 |
Note: Delta review verified all prior findings addressed. No new issues found.
Ito Test Report ✅ 15 test cases ran, 15 passed.

The unified QA run passed all 15/15 test cases with zero failures, confirming stable behavior across cost, AI-calls, conversation traces, and component render-generation flows in local non-production testing. Key findings were that access controls and deep-link login gating worked correctly; filtering/time-range interactions (including empty and future ranges plus rapid toggling) remained coherent without crashes; usage/cost totals and timeline estimated costs were accurate; mobile cost pages and trace navigation were usable; and security checks (cross-project tampering, malformed IDs, and query-parameter XSS payloads) produced safe denied/inert outcomes with no data leakage or backend internals exposure.
















Overview
This branch adds end-to-end LLM usage tracking and cost estimation to the Inkeep agent platform.
Previously, token counts from the AI SDK were discarded, so API responses returned hardcoded zeros. Now, every LLM generation across the system is instrumented with token counts, cost estimates, and rich OTEL attributes, all queryable through a new dashboard in the Manage UI.
All usage data lives in SigNoz spans, and the dashboard queries SigNoz directly via its ClickHouse-backed trace API.
1. Model Wrapper: usageCostMiddleware

File: packages/agents-core/src/utils/usage-cost-middleware.ts (new)

A Vercel AI SDK LanguageModelMiddleware that intercepts every LLM call (streaming and non-streaming) to calculate costs and set OTEL attributes.

How it works:
- Hooks both the doGenerate (non-streaming) and doStream (streaming) paths
- Extracts inputTokens, outputTokens, reasoningTokens, cachedReadTokens, and cachedWriteTokens from the result
- Calls PricingService.getModelPricing(modelName, provider) to look up per-token prices, then PricingService.calculateCost(tokenUsage, pricing)
- Sets gen_ai.cost.estimated_usd on the active OTEL span, or sets gen_ai.cost.pricing_unavailable = true if pricing is not found

Applied in ModelFactory.createModel(), which now automatically wraps every model created through the factory. This means all LLM calls in the system get cost tracking automatically, regardless of call site.
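The wrapping idea can be illustrated with a self-contained sketch. These are simplified stand-in types, not the AI SDK's real LanguageModelMiddleware interface:

```typescript
// Conceptual sketch of model wrapping: a middleware intercepts doGenerate,
// observes token usage after the underlying call, and returns the result
// unchanged. This is how cost tracking attaches without touching call sites.
interface GenerateResult {
  text: string;
  usage: { inputTokens: number; outputTokens: number };
}

interface ModelLike {
  doGenerate(prompt: string): Promise<GenerateResult>;
}

type Middleware = {
  wrapGenerate(run: () => Promise<GenerateResult>): Promise<GenerateResult>;
};

function wrapModel(model: ModelLike, middleware: Middleware): ModelLike {
  return {
    doGenerate: (prompt: string) => middleware.wrapGenerate(() => model.doGenerate(prompt)),
  };
}
```

A factory that always returns wrapModel(model, costMiddleware) gives every caller the instrumented model transparently.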
2. Pricing Service

File: packages/agents-core/src/utils/pricing-service.ts (new)

A singleton service initialized on API startup (agents-api/src/index.ts) with two-tier pricing lookup and periodic refresh.

Pricing sources:
- Primary: Vercel AI Gateway (@ai-sdk/gateway.getAvailableModels()), used when AI_GATEWAY_API_KEY is set
- Universal fallback: the models.dev API (https://models.dev/api.json)

Cost calculation support:

The service calculates cost across 5 token types: inputPerToken, outputPerToken, cachedReadPerToken, cachedWritePerToken, reasoningPerToken.

3. OTEL Attributes Added
File: packages/agents-core/src/constants/otel-attributes.ts (extended)

New span attributes are now set across all generation spans:

| Attribute | Type | Set by |
|---|---|---|
| gen_ai.cost.estimated_usd | float | usageCostMiddleware |
| gen_ai.cost.pricing_unavailable | boolean | usageCostMiddleware |
| gen_ai.usage.input_tokens | int | |
| gen_ai.usage.output_tokens | int | |
| gen_ai.usage.total_tokens | int | |
| gen_ai.usage.reasoning_tokens | int | |
| gen_ai.usage.cached_read_tokens | int | |
| gen_ai.generation.type | string | experimental_telemetry.metadata.generationType |
| gen_ai.generation.step_count | int | |
| gen_ai.generation.status | string | |
| gen_ai.generation.duration_ms | int | |
| gen_ai.generation.finish_reason | string | |
| gen_ai.generation.streamed | boolean | |
| gen_ai.generation.byok | boolean | |
| gen_ai.requested_model | string | |
| gen_ai.provider | string | |
| gen_ai.response.model | string | |
| gen_ai.message_id | string | |
| context.breakdown.* | int | generate.ts (12 sub-attributes for token breakdown) |

Generation type constants
4. All LLM Call Sites and Generation Types
Every LLM call site now passes generationType through experimental_telemetry.metadata, along with scoping IDs (tenantId, projectId, agentId, subAgentId, conversationId).

| Call site | generationType | Call |
|---|---|---|
| generate.ts | sub_agent_generation | streamText() |
| generate.ts | sub_agent_generation | generateText() |
| AgentSession.ts:1124 | status_update | generateText() with Output.object() |
| AgentSession.ts:1687 | artifact_metadata | generateText() with Output.object() |
| distill-utils.ts | | generateText() with Output.object() |
| BaseCompressor.ts -> distillConversation() | mid_generation_compression | distill-utils |
| ConversationCompressor.ts -> distillConversationHistory() | conversation_compression | distill-utils |
| distill-conversation-tool.ts | mid_generation_compression | distill-utils |
| distill-conversation-history-tool.ts | conversation_compression | distill-utils |
| EvaluationService.ts | eval_simulation | |
| EvaluationService.ts | eval_scoring | |
| data-components/.../generate-render/route.ts | component_render | streamText() |

Before this branch
Only call sites 1–2 had any telemetry (operation: 'generate'). All others had no generation type, no scoping IDs, and no cost tracking.

After this branch

All 12 call sites now emit:
- generationType in telemetry metadata
- Scoping IDs (tenantId, projectId, agentId, subAgentId, conversationId)

5. Usage Dashboard
Files:
- agents-manage-ui/src/components/cost/cost-dashboard.tsx (new, shared component)
- agents-manage-ui/src/app/[tenantId]/cost/page.tsx (new, tenant-level page)
- agents-manage-ui/src/app/[tenantId]/projects/[projectId]/cost/page.tsx (new, project-level page)
- agents-manage-ui/src/lib/api/signoz-stats.ts (extended with usage query methods)

How it queries data
The dashboard queries SigNoz's trace API (ClickHouse-backed) directly — no intermediate database table.
It filters for spans where:
- the operation is generateText or streamText (AI SDK operations)
- ai.telemetry.generation_type is one of the 8 valid generation types
- project.id matches the current project scope

Three parallel queries on page load
By Model: sum(input_tokens), sum(output_tokens), sum(cost), count(), grouped by gen_ai.model.id

By Generation Type: grouped by ai.telemetry.generation_type

Events List
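The "By Model" aggregation over flattened span rows can be sketched as follows (the row shape is an assumption for illustration, not the actual SigNoz response format):

```typescript
// Hedged sketch: group usage span rows by model id and sum tokens, cost,
// and a generation count, mirroring the dashboard's per-model summary.
interface UsageSpanRow {
  modelId: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

interface ModelSummary extends UsageSpanRow {
  count: number;
}

function summarizeByModel(rows: UsageSpanRow[]): ModelSummary[] {
  const byModel = new Map<string, ModelSummary>();
  for (const row of rows) {
    const s = byModel.get(row.modelId) ?? {
      modelId: row.modelId,
      inputTokens: 0,
      outputTokens: 0,
      costUsd: 0,
      count: 0,
    };
    s.inputTokens += row.inputTokens;
    s.outputTokens += row.outputTokens;
    s.costUsd += row.costUsd;
    s.count += 1;
    byModel.set(row.modelId, s);
  }
  return [...byModel.values()];
}
```

In the real dashboard this grouping runs inside SigNoz's ClickHouse aggregation; the client-side sketch just shows the shape of the result.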
Visualizations
Filters and navigation
24h, 7d, 15d, 30d, or custom date range

6. Other Notable Changes
- Moved the token estimator from agents-api to agents-core (packages/agents-core/src/utils/token-estimator.ts) so it can be shared across packages
- ai-sdk-callbacks.ts enhanced: uses actual step usage (steps[N].usage.inputTokens) instead of always estimating
- GenerationType type export derived from GENERATION_TYPES